Results 1 - 7 of 7
1.
Nat Commun ; 13(1): 1728, 2022 04 01.
Article in English | MEDLINE | ID: mdl-35365602

ABSTRACT

Deep Learning (DL) has recently enabled unprecedented advances in one of the grand challenges in computational biology: the half-century-old problem of protein structure prediction. In this paper we discuss recent advances, limitations, and future perspectives of DL on five broad areas: protein structure prediction, protein function prediction, genome engineering, systems biology and data integration, and phylogenetic inference. We discuss each application area and cover the main bottlenecks of DL approaches, such as training data, problem scope, and the ability to leverage existing DL architectures in new contexts. To conclude, we provide a summary of the subject-specific and general challenges for DL across the biosciences.


Subjects
Deep Learning , Computational Biology , Phylogeny , Proteins , Systems Biology
2.
Proc Natl Acad Sci U S A ; 119(1), 2022 01 04.
Article in English | MEDLINE | ID: mdl-34937698

ABSTRACT

Fitness functions map biological sequences to a scalar property of interest. Accurate estimation of these functions yields biological insight and sets the foundation for model-based sequence design. However, the fitness datasets available to learn these functions are typically small relative to the large combinatorial space of sequences; characterizing how much data are needed for accurate estimation remains an open problem. There is a growing body of evidence demonstrating that empirical fitness functions display substantial sparsity when represented in terms of epistatic interactions. Moreover, the theory of Compressed Sensing provides scaling laws for the number of samples required to exactly recover a sparse function. Motivated by these results, we develop a framework to study the sparsity of fitness functions sampled from a generalization of the NK model, a widely used random field model of fitness functions. In particular, we present results that allow us to test the effect of the Generalized NK (GNK) model's interpretable parameters (sequence length, alphabet size, and assumed interactions between sequence positions) on the sparsity of fitness functions sampled from the model and, consequently, the number of measurements required to exactly recover these functions. We validate our framework by demonstrating that GNK models with parameters set according to structural considerations can be used to accurately approximate the number of samples required to recover two empirical protein fitness functions and an RNA fitness function. In addition, we show that these GNK models identify important higher-order epistatic interactions in the empirical fitness functions using only structural information.
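
The link between neighborhood structure and sparsity described in this abstract can be illustrated with a small sketch. It is not the authors' GNK code: it assumes a binary alphabet (so the epistatic basis is the Walsh-Hadamard basis), circular adjacent neighborhoods, and Gaussian local fitness contributions, and every function name is hypothetical. Under those assumptions, the only epistatic coefficients that can be nonzero are those whose positions all fall inside a single neighborhood.

```python
import itertools
import random


def walsh_hadamard(values):
    """Normalized fast Walsh-Hadamard transform (length must be a power of two)."""
    coeffs = list(values)
    n = len(coeffs)
    half = 1
    while half < n:
        for start in range(0, n, 2 * half):
            for j in range(start, start + half):
                a, b = coeffs[j], coeffs[j + half]
                coeffs[j], coeffs[j + half] = a + b, a - b
        half *= 2
    return [c / n for c in coeffs]


def sample_nk_fitness(length, neighborhoods, rng):
    """Fitness of all 2**length binary sequences under an NK-style model.

    Each position contributes a random value that depends only on the bits
    in its neighborhood; the total fitness is the sum of those contributions.
    """
    tables = [
        {bits: rng.gauss(0.0, 1.0)
         for bits in itertools.product((0, 1), repeat=len(nb))}
        for nb in neighborhoods
    ]
    fitness = []
    for index in range(2 ** length):
        seq = [(index >> pos) & 1 for pos in range(length)]
        fitness.append(sum(table[tuple(seq[p] for p in nb)]
                           for table, nb in zip(tables, neighborhoods)))
    return fitness


def allowed_interactions(neighborhoods):
    """Bitmasks of every subset of every neighborhood (the possible support)."""
    masks = set()
    for nb in neighborhoods:
        for size in range(len(nb) + 1):
            for subset in itertools.combinations(nb, size):
                masks.add(sum(1 << p for p in subset))
    return masks


def nonzero_interactions(fitness, tol=1e-9):
    """Indices of Walsh-Hadamard coefficients that are numerically nonzero."""
    return {k for k, c in enumerate(walsh_hadamard(fitness)) if abs(c) > tol}


# Assumed toy setting: 6 positions, each interacting with its right neighbor.
length = 6
neighborhoods = [(i, (i + 1) % length) for i in range(length)]
fitness = sample_nk_fitness(length, neighborhoods, random.Random(0))
support = nonzero_interactions(fitness)
print(len(support), "nonzero coefficients out of", 2 ** length)
```

In this toy setting the possible support has 13 elements (the constant term, 6 single positions, and 6 adjacent pairs), far fewer than the 64 coefficients of an arbitrary function on 6 binary positions. That gap is the kind of sparsity that Compressed Sensing sample-complexity bounds exploit.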


Subjects
Genetic Epistasis , Learning/physiology , Algorithms , Theoretical Models
3.
Nat Commun ; 12(1): 5225, 2021 09 01.
Article in English | MEDLINE | ID: mdl-34471113

ABSTRACT

Despite recent advances in high-throughput combinatorial mutagenesis assays, the number of labeled sequences available to predict molecular functions has remained small for the vastness of the sequence space combined with the ruggedness of many fitness functions. While deep neural networks (DNNs) can capture high-order epistatic interactions among the mutational sites, they tend to overfit to the small number of labeled sequences available for training. Here, we developed Epistatic Net (EN), a method for spectral regularization of DNNs that exploits evidence that epistatic interactions in many fitness functions are sparse. We built a scalable extension of EN, usable for larger sequences, which enables spectral regularization using fast sparse recovery algorithms informed by coding theory. Results on several biological landscapes show that EN consistently improves the prediction accuracy of DNNs and enables them to outperform competing models which assume other priors. EN estimates the higher-order epistatic interactions of DNNs trained on massive sequence spaces, a computational problem that otherwise takes years to solve.


Subjects
Algorithms , Computer Neural Networks , Bacteria , Green Fluorescent Proteins
4.
Proc Natl Acad Sci U S A ; 118(10), 2021 03 09.
Article in English | MEDLINE | ID: mdl-33658362

ABSTRACT

The motion of nanoparticles near surfaces is of fundamental importance in physics, biology, and chemistry. Liquid cell transmission electron microscopy (LCTEM) is a promising technique for studying the motion of nanoparticles with high spatial resolution. Yet, the lack of understanding of how the electron beam of the microscope affects the particle motion has held back advancement in using LCTEM for in situ single nanoparticle and macromolecule tracking at interfaces. Here, we experimentally studied the motion of a model system of gold nanoparticles dispersed in water and moving adjacent to the silicon nitride membrane of a commercial LC over a broad range of electron beam dose rates. We find that the nanoparticles exhibit anomalous diffusive behavior modulated by the electron beam dose rate. We characterized the anomalous diffusion of nanoparticles in LCTEM using a convolutional deep neural-network model and canonical statistical tests. The results demonstrate that the nanoparticle motion is governed by fractional Brownian motion at low dose rates, resembling diffusion in a viscoelastic medium, and continuous-time random walk at high dose rates, resembling diffusion on an energy landscape with pinning sites. Both behaviors can be explained by the presence of silanol molecular species on the surface of the silicon nitride membrane and the ionic species in solution formed by radiolysis of water in the presence of the electron beam.
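
For readers unfamiliar with how anomalous diffusion is usually quantified, a minimal sketch follows. It is a stand-in, not the paper's neural-network classifier or its specific statistical tests: it estimates the exponent alpha in MSD(lag) ~ lag^alpha from a single 2D track by a log-log least-squares fit of the time-averaged mean squared displacement. The function names and the example track are assumptions made for illustration.

```python
import math


def time_averaged_msd(track, lag):
    """Time-averaged mean squared displacement of a 2D track at one lag."""
    count = len(track) - lag
    total = 0.0
    for i in range(count):
        dx = track[i + lag][0] - track[i][0]
        dy = track[i + lag][1] - track[i][1]
        total += dx * dx + dy * dy
    return total / count


def anomalous_exponent(track, lags):
    """Slope of log(MSD) versus log(lag), fitted by ordinary least squares."""
    xs = [math.log(lag) for lag in lags]
    ys = [math.log(time_averaged_msd(track, lag)) for lag in lags]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


# Hypothetical track moving at constant velocity (ballistic motion).
ballistic_track = [(float(t), 0.0) for t in range(100)]
alpha = anomalous_exponent(ballistic_track, lags=[1, 2, 4, 8, 16])
print("estimated exponent:", alpha)
```

An exponent near 1 corresponds to ordinary Brownian motion, values below 1 to subdiffusion, and values above 1 to superdiffusion. The exponent alone does not distinguish fractional Brownian motion from a continuous-time random walk, which is why additional tests or a trained classifier are needed for that step.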

5.
Bioinformatics ; 36(Suppl_1): i560-i568, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32657417

ABSTRACT

SUMMARY: We propose a new spectral framework for reliable training, scalable inference and interpretable explanation of the DNA repair outcome following a Cas9 cutting. Our framework, dubbed CRISPRLand, relies on an unexploited observation about the nature of the repair process: the landscape of the DNA repair is highly sparse in the (Walsh-Hadamard) spectral domain. This observation enables our framework to address key shortcomings that limit the interpretability and scaling of current deep-learning-based DNA repair models. In particular, CRISPRLand reduces the time to compute the full DNA repair landscape from a striking 5230 years to 1 week and the sampling complexity from 10^12 to 3 million guide RNAs with only a small loss in accuracy (R^2 ∼ 0.9). Our proposed framework is based on a divide-and-conquer strategy that uses a fast peeling algorithm to learn the DNA repair models. CRISPRLand captures lower-degree features around the cut site, which enrich for short insertions and deletions as well as higher-degree microhomology patterns that enrich for longer deletions. AVAILABILITY AND IMPLEMENTATION: The CRISPRLand software is publicly available at https://github.com/UCBASiCS/CRISPRLand.


Subjects
DNA Repair , Software , Algorithms
6.
Nat Biotechnol ; 37(9): 1034-1037, 2019 09.
Article in English | MEDLINE | ID: mdl-31359007

ABSTRACT

Understanding of repair outcomes after Cas9-induced DNA cleavage is still limited, especially in primary human cells. We sequence repair outcomes at 1,656 on-target genomic sites in primary human T cells and use these data to train a machine learning model, which we have called CRISPR Repair Outcome (SPROUT). SPROUT accurately predicts the length, probability and sequence of nucleotide insertions and deletions, and will facilitate design of SpCas9 guide RNAs in therapeutically important primary human cells.


Subjects
CRISPR-Cas Systems , Gene Editing/methods , Kinetoplastida Guide RNA/genetics , T-Lymphocytes/physiology , Cell Line , Gene Expression Regulation , Genome , Genomics , Humans , Induced Pluripotent Stem Cells/physiology
7.
Sci Adv ; 2(9): e1600025, 2016 09.
Article in English | MEDLINE | ID: mdl-27704040

ABSTRACT

Early identification of pathogens is essential for limiting development of therapy-resistant pathogens and mitigating infectious disease outbreaks. Most bacterial detection schemes use target-specific probes to differentiate pathogen species, creating time and cost inefficiencies in identifying newly discovered organisms. We present a novel universal microbial diagnostics (UMD) platform to screen for microbial organisms in an infectious sample, using a small number of random DNA probes that are agnostic to the target DNA sequences. Our platform leverages the theory of sparse signal recovery (compressive sensing) to identify the composition of a microbial sample that potentially contains novel or mutant species. We validated the UMD platform in vitro using five random probes to recover 11 pathogenic bacteria. We further demonstrated in silico that UMD can be generalized to screen for common human pathogens in different taxonomy levels. UMD's unorthodox sensing approach opens the door to more efficient and universal molecular diagnostics.
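
The idea of identifying organisms from a few target-agnostic probes can be sketched for the simplest case. The example below is not the UMD recovery algorithm: it assumes a noiseless sample containing exactly one species (a 1-sparse signal), a hypothetical probe-to-species affinity matrix drawn at random, and identification by the best-matching column, which amounts to a single matching-pursuit step. All names and numeric values are illustrative.

```python
import math
import random


def make_affinity_matrix(num_probes, num_species, rng):
    """Hypothetical probe-to-species affinities (rows: probes, columns: species)."""
    return [[rng.random() for _ in range(num_species)]
            for _ in range(num_probes)]


def measure(matrix, species_index, concentration):
    """Noiseless probe readout for a sample containing a single species."""
    return [row[species_index] * concentration for row in matrix]


def identify_single_species(matrix, measurement):
    """Return the species whose affinity column best aligns with the readout."""
    measurement_norm = math.sqrt(sum(m * m for m in measurement))
    best_index, best_score = None, -1.0
    for j in range(len(matrix[0])):
        col = [row[j] for row in matrix]
        col_norm = math.sqrt(sum(c * c for c in col))
        dot = sum(c * m for c, m in zip(col, measurement))
        score = abs(dot) / (col_norm * measurement_norm)  # cosine similarity
        if score > best_score:
            best_index, best_score = j, score
    return best_index


# Assumed sizes mirror the abstract: 5 random probes, 11 candidate bacteria.
rng = random.Random(1)
affinities = make_affinity_matrix(num_probes=5, num_species=11, rng=rng)
readout = measure(affinities, species_index=7, concentration=2.5)
print("identified species index:", identify_single_species(affinities, readout))
```

With several species present, or with measurement noise, this single correlation step is no longer sufficient and an iterative or convex sparse-recovery method would take its place. The sketch only shows why fewer probes than candidate species can still separate them when the sample is sparse.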


Subjects
Bacteria/genetics , DNA Probes/genetics , Bacterial DNA/genetics , Infections/diagnosis , Bacteria/isolation & purification , Bacteria/pathogenicity , Bacterial DNA/classification , Humans , Infections/genetics , Infections/microbiology , Polymerase Chain Reaction